Masculinity and Femininity:
A Bipolar Construct and Independent Constructs 

ABSTRACT 

The present investigation is a reanalysis of data from Antill and Cunningham
(1979; 1980; Marsh, Antill & Cunningham, 1987) consisting of responses to
five Masculinity-Femininity (MF) instruments, two esteem instruments, and
two social desirability scales. Correlations between M and F for the 5
instruments varied from .23 to approximately -1.0; support for
distinguishable (non-bipolar) M and F factors was found for 4 of the
instruments. Applying confirmatory factor analysis (CFA) and hierarchical
CFA (HCFA), the present study examined the dimensionality of MF and the
influence of method/halo effects in responses to specific instruments. The
best fitting model identified three higher-order factors; in support of
traditional personality theories one factor was a bipolar MF construct, but 
in support of androgyny theory the other two factors were distinguishable M 
and F factors. The factor structures were reasonably invariant for men and 
women, and methodological implications of this important finding were 
examined. In subsequent analyses, the higher-order MF factors were related 
to esteem, social desirability, and gender in order to further test 
interpretations of the MF factors. 
Masculinity and Femininity:
A Bipolar Construct and Independent Constructs 
The present investigation is a reanalysis of data from Antill and
Cunningham (1979; 1980; Marsh, Antill & Cunningham, 1987) consisting of
responses to five Masculinity-Femininity (MF) instruments, two self-esteem
instruments, and two social desirability instruments. The purposes of the
present investigation are to: a) examine the dimensionality of MF; b) 
examine the relation of derived MF factors to other constructs (esteem, 
social desirability, and gender); c) demonstrate the implications of testing 
the dimensionality for men and women separately, for the total-group 
covariance matrix, or for the pooled within-group covariance matrix that 
removes the effect of gender; and d) demonstrate recent advances in the 
application of confirmatory factor analysis (CFA) and hierarchical CFA 
(HCFA) to such problems. 

The MF Construct and Its Relation to Esteem and Social Desirability
The Dimensionality of MF

Virtually all researchers prior to Constantinople's 1973 review and 
many current personality inventories assume that M and F are the end-points 
of a single, bipolar dimension. This implies that the correlation between M 
and F is close to -1.0. More recently, androgyny researchers have argued 
that it is logically possible to be both M and F, and the existence of both 
in the same person has been labeled androgyny. The two key hypotheses of 
androgyny theory are that: (a) M and F are distinguishable dimensions and 
(b) individuals high on both M and F are mentally healthier and socially 
more effective. A considerable and growing body of research has been 
directed at contrasting these two apparently opposing views of M and F (see 
Marsh & Myers, 1986, for a review).

In support of androgyny theory, androgyny researchers have typically 
found that MF correlations (i.e., correlations between M and F) differ 
significantly from -1.0. However, Marsh and Myers (1986) found that MF 
correlations for different instruments varied from moderately positive to 
close to -1.0. They showed how such differences were logically consistent 
with the design of the instruments. For example, the use of only socially 
desirable attributes to represent M and F may produce a response bias that 
results in a near-zero or positive MF correlation that is consistent with
androgyny theory. Alternatively, the use of logically opposed items to
represent M and F is likely to result in a substantially negative MF
correlation that is consistent with a bipolar MF. Also, in an exploratory
factor analysis of the original BSRI items, Pedhazur and Tetenbaum (1979)
found that responses to the adjectives "masculine" and "feminine" were
substantially negatively correlated and formed a two-item bipolar factor. In
their factor analysis they reported four orthogonal factors defined by
traditionally feminine items, traditionally masculine items, the bipolar MF
factor defined by the "masculine" and "feminine" adjectives, and an
additional factor which they called self-sufficiency. On the basis of this
four-factor solution, they concluded that: "The fact that the traits
Masculine and Feminine describe a separate bipolar factor also casts doubt
on the validity of the classification of the remaining items as masculine
and feminine" (p. 1012). Whereas most research such as that summarized here
has sought to establish the structure of MF in separate analyses of
individual instruments, the purpose of the present investigation is to
establish the structure of MF across responses to five different instruments.
The Five MF Instruments Used Here.

The five MF instruments considered in the present investigation 
represent very different approaches to the measurement and conceptualization 
of the MF construct. The MF scale from the California Psychological
Inventory (CPI; Megargee, 1972), like many traditional personality
instruments, contains items that maximally differentiate men and women so
that M and F scores are highly correlated with gender. Because biological
gender is bipolar, this type of scale is likely to also be bipolar. In an
alternative approach, the Comrey Personality Scales (CPS; Comrey, 1970; also
see Marsh, 1985) is based on distinct item clusters designed to represent
components of MF on a logical/theoretical basis that were substantiated by
factor analysis. Consistent with the CPS assumption of bipolarity, logically 
opposed items were used to reflect the M and F endpoints within each of the 
item clusters. Hence both these traditional personality instruments 
conceptualize MF to be a bipolar construct. 

The Bem Sex Role Inventory (BSRI; Bem, 1974) is based on socially 
desirable items empirically rated to be more desirable for one sex or the
other. In contrast, the Personal Attributes Questionnaire (PAQ; Spence,
1984) is based on socially desirable items rated to be more typical of one
sex or the other. Spence and Bem also offer theoretical distinctions between
the two instruments such as the generality of the M and F constructs
inferred by the two instruments. Spence (1984) emphasized that PAQ measures
two trait clusters that can be labeled dominance/self-assertiveness (PAQ M)
and nurturance/interpersonal orientation (PAQ F). Nevertheless, both PAQ and
BSRI are based on socially desirable attributes, both result in
distinguishable (non-bipolar) constructs, and PAQ scores are highly
correlated with BSRI scores (Cook, 1985). Frequent criticisms of these
instruments include: a) their reliance on socially desirable attributes; b)
their atheoretical approach to instrument construction; and c) the limited
scope of the M and F traits based on them (for further discussion see Cook,
1985; Kelly & Worell, 1977; Locksley & Colten, 1979; Marsh & Myers, 1986;
Pedhazur & Tetenbaum, 1979; Spence, 1984). Nevertheless, these two scales
continue to be the most frequently used in androgyny research. [The original
version of PAQ also included a bipolar MF scale and subsequent versions of
PAQ included M and F scales derived from negatively valued items, but these
additional PAQ scales were not considered here.]

The ANDRO scale (Berzins, Welling & Wetter, 1978) was developed by
selecting existing items from the Personality Research Form (PRF; Jackson,
1967) according to their sex-typed desirability and their consistency with
the content themes in the BSRI. Ratings by university undergraduates were
used to corroborate the sex-typed desirability of the items. The rationale
for ANDRO was to develop an instrument consistent with the BSRI based on PRF
responses so that the androgyny construct could be examined in the wide
range of studies that have used, and will use, the PRF. Hence, the
conceptualization of the ANDRO MF scales, though based on items from a
traditional personality instrument, is similar to BSRI and PAQ.

The five MF instruments considered in the present investigation differ 
substantially in their conceptualization and design. Hence, an important 
question is the extent to which they measure common M and F traits.
Multitrait-multimethod (MTMM) analysis (Campbell & Fiske, 1959; Marsh, in
press; Marsh & Hocevar, 1983) is ideally suited to examine this question.
Within this MTMM perspective there are two traits (M and F) and five methods
(the five MF instruments). The substantive questions to be examined are: a)
to what extent can the M and F scores from each instrument be combined to
form a global M, a global F, or a bipolar global MF? b) what are the
relations among these global measures? and c) what are the influences of
method effects that are idiosyncratic to particular instruments?

Relations Among MF Responses and Other Constructs.

Esteem. Androgyny theory posits that both M and F, or perhaps the M x F
cross-product, should contribute positively and uniquely to esteem.
Extensive reviews have examined the MF/esteem relation and theoretical
models of this relation (e.g., Hall & Taylor, 1985; Lubinski, Tellegen &
Butcher, 1983; Marsh, 1987; Marsh, Antill & Cunningham, 1987; Spence, 1984;
Whitley, 1983). However, empirical findings indicate that whereas the
contribution of M is substantial and positive, the unique contribution of F
is nil or negative (but see Marsh, 1987) as is M x F. Furthermore, the
effects of none of these variables seem to depend on gender. Marsh, Antill
and Cunningham (1987) tested various models of the MF/esteem relation with
the data considered here. They found that the unique contribution of M to
esteem was consistently more positive than that of F, which was either nil or
negative, did not vary with gender as posited by sex-typed models, and did
not interact with F as posited by interactive androgyny models.

Social desirability. Social desirability is an inferred response bias
or method effect whereby individuals respond to the desirability of an item
instead of or in addition to the specific item content. Methodological
issues related to social desirability are important in androgyny research
(Marsh, 1987; Marsh, Antill & Cunningham, 1987). The MF correlation is
probably influenced by the social desirability of the M and F items. If
both M and F items are consistently positive, or consistently negative in
terms of social desirability, then the MF correlation is likely to be more
positive than if the M and F items are neutral. Furthermore, if M and F
items are consistently high in terms of social desirability, then the
MF/esteem relation may be explicable in terms of the social desirability of
the MF items instead of their specific M or F content. Finally, if the
social desirability of M items differs substantially from that of F items,
then the differential contribution of M and F to the prediction of desirable
outcomes may be due to differences in social desirability instead of
differences in the M and F content of the items.

The influence of social desirability is often viewed as an undesirable
response bias or source of invalidity, but this view may be too simplistic.
For example, high scores on both esteem and social desirability measures are
typically inferred from positive responses to socially desirable attributes
and negative responses to socially undesirable attributes so that esteem and
social desirability responses should be substantially correlated. Marsh,
Antill and Cunningham (1987) found esteem to be more positively correlated
with M than F, whereas social desirability was more positively correlated
with F than M. They speculated that esteem items may be stereotypically more
masculine whereas social desirability may be stereotypically more feminine.
Consistent with this explanation, males had higher esteem scores than
females, but females had higher social desirability scores than males.

Gender. Not surprisingly, males tend to have higher M scores than
females whereas females tend to have higher F scores than males. The size
of these sex differences, however, varies substantially depending on the MF
instrument. Marsh and Myers (1986) found that responses to instruments
designed to infer a bipolar MF construct were more substantially related to
gender than were responses to instruments designed to measure independent M
and F constructs. The relation of gender to M and F scales also complicates 
the examination of the factor structure of MF responses. Important questions 
to be addressed are whether factor structures are invariant for males and 
females and whether the influence of biological gender is a valid source of 
influence in the formation of M and F. 

Method

Sample and Materials

The sample, materials, and the collection of data are described in more 
detail by Antill and Cunningham (1979; 1980). Briefly, the subjects were 104
male and 133 female college students who completed: a) the Bem Sex Role
Inventory (BSRI; Bem, 1974) consisting of 20 M, 20 F, and 20 neutral (Social
Desirability) adjectives; b) the Personal Attributes Questionnaire (PAQ;
Spence, 1984) consisting of 23 M and 18 F adjectives; c) the ANDRO
instrument (Berzins, Welling & Wetter, 1978) consisting of 29 M and 27 F
items from the Personality Research Form (PRF; Jackson, 1967) and the Social
Desirability scale from the PRF; d) the Femininity scale of the California
Psychological Inventory (CPI; Gough, 1957) consisting of 21 M items and 17 F
items; e) the Masculinity versus Femininity scale of the Comrey Personality
Scales (CPS; Comrey, 1970; also see Marsh, 1985) consisting of 10 M and 10 F
items; f) the Feelings of Inadequacy Scale (Janis & Field, 1959, as revised
by Eagly, 1967) consisting of 20 esteem items; and g) the Self-Acceptance
Scale (Berger, 1952) consisting of 36 esteem items. The first three MF
measures (BSRI, PAQ, ANDRO) were explicitly designed as androgyny measures
and provide separate M and F scores. The CPI and CPS were designed to infer
a bipolar MF, but separate M and F scores can be constructed by scoring M
and F items separately.

Preliminary analyses. Psychometric properties of the self-report scales
are summarized in Table 1. For all five MF instruments, M, F, and bipolar MF
(M items scored positively and F items scored negatively) have at least
modest coefficient alpha estimates of reliability and correlate
substantially with gender in the expected direction. The bipolar MF from
the CPI was originally devised to differentiate between males and
females, and it correlates with gender at a level close to the reliability
of the scale. MF correlations for the other instruments are smaller but
still substantial. The correlations between M and F scales vary from
modestly positive to close to -1.0. The negative correlation for the CPS
approaches the reliability of its M and F scales in a manner that is
consistent with its bipolar conceptualization of MF. Actually, after
correction for unreliability the MF correlation for the CPS is slightly more
negative than -1.0. Consistent with their design, the M and F scores for the
BSRI and PAQ are positively correlated with social desirability. In contrast,
social desirability is less positively correlated with the M and F scales from
the CPS and the ANDRO, and negatively correlated with the CPI scales.
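
A correction for unreliability of the kind mentioned above is commonly the
classical correction for attenuation: the observed correlation divided by the
square root of the product of the two reliabilities. The sketch below assumes
that form; the function name and all numeric values are hypothetical and are
not taken from Table 1. It shows how a corrected correlation can fall slightly
below -1.0 when the observed correlation approaches the scale reliabilities.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation for unreliability in both scales.

    Assumes the classical correction for attenuation:
    r_corrected = r_xy / sqrt(rel_x * rel_y).
    """
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values chosen for illustration only (not from Table 1).
observed_r = -0.70               # observed correlation between M and F scales
alpha_m, alpha_f = 0.68, 0.70    # coefficient alpha for each scale

corrected = disattenuate(observed_r, alpha_m, alpha_f)
print(round(corrected, 3))       # slightly more negative than -1.0
```

A corrected value beyond -1.0 is not an admissible correlation; it indicates
that the observed correlation is as large as the reliabilities allow, which is
what a bipolar construct would produce.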

Insert Table 1 About Here

For all five MF instruments, M scores are more positively correlated
with esteem than are F scores so that all five bipolar MF scores are
positively correlated with esteem (Table 1). Though not reported here, other
analyses of these data (Marsh, Antill and Cunningham, 1987) indicated that
for all five MF instruments the contribution of F after controlling for M
was nil or negative, the M x F cross-product did not contribute to esteem
beyond the contribution of M and F, and the effects of M, F and M x F did
not interact with gender. Controlling for social desirability did not alter
the general pattern of results, and all interaction effects (M x F and
those involving gender) were still nonsignificant.

Confirmatory Factor Analysis (CFA). For present purposes, items from
each of the 14 (5 M, 5 F, 2 social desirability, and 2 esteem) scales were
randomly divided into thirds (in subsequent discussion these are called item
parcels or simply parcels). A covariance matrix derived from these 42 (14 x
3) parcels for all subjects was the basis of the CFAs. The large number of
items in these 14 scales (251) precluded the analysis of responses to
individual items. Furthermore, there are important advantages to analyzing
responses to subscale scores instead of items: (a) parcel scores typically
have greater reliability and generality, (b) response biases and other
characteristics that are idiosyncratic to individual items are likely to
have less influence, (c) the ratios of measured variables to inferred
factors and to estimated parameters are increased, and (d) distributions of
the measured variables are less likely to cause problems for factor analyses,
particularly when item responses are dichotomous.
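
The parceling step described above can be sketched as follows. This is a
minimal illustration, not the procedure actually used: the item labels, the
fixed random seed, the round-robin allocation, and the choice to score each
parcel as a mean (rather than a sum) are all assumptions.

```python
import random


def make_parcels(item_ids, n_parcels=3, seed=0):
    """Randomly split one scale's items into n_parcels groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(item_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled items round-robin so parcel sizes differ by at most one.
    return [shuffled[k::n_parcels] for k in range(n_parcels)]


def parcel_scores(responses, parcels):
    """Mean response per parcel for one respondent (responses: item_id -> score)."""
    return [sum(responses[i] for i in parcel) / len(parcel) for parcel in parcels]


# Hypothetical 20-item scale answered by one respondent on a 1-7 scale.
items = [f"m{i:02d}" for i in range(1, 21)]
parcels = make_parcels(items)
answers = {item: 1 + (idx % 7) for idx, item in enumerate(items)}
scores = parcel_scores(answers, parcels)
print([len(p) for p in parcels], scores)
```

Applying the same split to each of the 14 scales would yield the 42 parcel
scores per respondent from which a covariance matrix can be computed.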
Box's M (SPSS, 1986) was used to test the equality of the
variance/covariance matrices for males and females. In results discussed
later in more detail, the two covariance matrices did not differ
significantly (p > .05) for the 30x30 matrix based on the 30 MF parcels
derived from the 5 MF instruments, or the 42x42 matrix that included the 6
esteem and 6 social desirability parcels. Based in part on these findings,
the focus of subsequent results was on CFAs conducted on the total group
covariance matrix.

In preliminary, unreported CFAs, various one- and two-factor models 
were fit to responses from each instrument separately. These results
indicated that two-factor (M and F) solutions fit responses to all but the
CPS instrument substantially better than did one-factor solutions.
Correlations between the M and F factors in the two-factor solutions were
similar to the MF correlations between scale scores that have been corrected
for unreliability (Table 1). In the first set of analyses considered here,
models were fit to responses from all five MF instruments. In subsequent
analyses, relations between MF responses and other variables were
considered. The CFAs were conducted with LISREL V (Joreskog & Sorbom, 1981).
Introductions to the use of CFA and LISREL are available elsewhere (e.g.,
Bagozzi, 1981; Joreskog, 1981; Joreskog & Sorbom, 1981; Long, 1983; Marsh,
1985; Marsh & Hocevar, 1983; 1984; 1985; 1988; Pedhazur, 1982) and so are
not presented in detail. The details of these models are presented in the 
Results section (also see Appendix I). 

In CFA there are no well-established guidelines for testing goodness
of fit. The general approach, and the one used here, is to: a) examine
parameters in relation to substantive issues; b) evaluate the overall
goodness of fit in terms of statistical significance and in comparison to
alternative models; and c) evaluate subjective goodness-of-fit indicators
such as the χ²/df ratio and the Tucker-Lewis Index (TLI; Marsh, Balla &
McDonald, 1988) and compare values from alternative models.
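
The two subjective indicators named above can be computed directly from
chi-square values and degrees of freedom. The sketch below assumes the
standard definition of the TLI, which compares the χ²/df ratio of the target
model with that of a null model in which the measured variables are
uncorrelated. The chi-square values are hypothetical and are not those in
Table 4; only the null-model df of 435 follows from having 30 parcels
(30 x 29 / 2).

```python
def chi2_df_ratio(chi2, df):
    """Chi-square divided by its degrees of freedom."""
    return chi2 / df


def tucker_lewis_index(chi2_null, df_null, chi2_model, df_model):
    """TLI = (null ratio - model ratio) / (null ratio - 1)."""
    null_ratio = chi2_df_ratio(chi2_null, df_null)
    model_ratio = chi2_df_ratio(chi2_model, df_model)
    return (null_ratio - model_ratio) / (null_ratio - 1.0)


# Hypothetical chi-square values for illustration only.
tli = tucker_lewis_index(chi2_null=6000.0, df_null=435,
                         chi2_model=900.0, df_model=360)
print(round(chi2_df_ratio(900.0, 360), 2), round(tli, 3))
```

Because the index is built from χ²/df ratios, a model whose chi-square equals
its degrees of freedom yields a TLI of 1.0, and less parsimonious models are
not automatically rewarded for a smaller chi-square.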

A related problem is the occurrence of Heywood cases, parameter
estimates that are outside of the range of allowable values, such as
residual variance estimates that are negative. Heywood cases are more likely
when the sample size is small relative to the number of parameters that are
estimated, when there are few indicators for each factor, and when the
factor structure is complex (e.g., variables are associated with more than
one factor). Heywood cases are likely to represent sampling error when the
confidence interval about the improper parameter estimate contains proper
values and the size of its standard error is reasonable (Gerbing & Anderson,
1987; Van Driel, 1978). For example, for simulated data tested with the true
population model, Anderson and Gerbing (1984) found that 25% of the
solutions contained Heywood cases. However, the occurrence of such Heywood
cases had little effect on parameter estimates for other factors or on
goodness-of-fit (Gerbing & Anderson, 1987). Hence, improper parameter
estimates are unlikely to substantially affect substantive conclusions as 
long as confidence intervals about improper estimates contain proper values, 
and standard errors are reasonable. In alternative approaches to this problem 
(e.g., Dillon, Kumar & Mulani, 1987) it is possible to artificially restrict
the solution space so as to exclude improper solutions or to simply fix the
offending parameter estimate to have a value on the border of the permissible
solution space (e.g., when variance estimates are negative they can be fixed 
at zero or at a small positive value). These strategies, however, merely make 
the problem less obvious and rarely have any substantive effect on the 
results (see Marsh, 1988). Heywood cases may also be symptomatic of poor 
models, particularly when parameter estimates are far outside of the range of 
permissible values or when the standard errors for the offending parameters 
are very large. The problem, of course, is how to determine whether Heywood 
cases are due to a poor model or to sampling fluctuations. Dillon, Kumar and 
Mulani (1987, p. 134) offered the following advice: "if the model provides a 
reasonable fit, the respective confidence interval for the offending estimate 
covers zero, and the magnitude of the standard error is roughly the same as 
the other estimated standard errors, the Heywood case is likely to be due to
sampling fluctuations." To this good advice might be added the suggestion
that the results be substantively reasonable. Even though Heywood cases are
common in CFA studies, their occurrence should always be noted and should
dictate caution in subsequent interpretations.
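
The two numeric checks in the advice quoted above can be expressed as a simple
screen for a negative variance estimate. This is a sketch under stated
assumptions: the 1.96 multiplier for the confidence interval and the rule that
a standard error is "roughly the same" if it is no more than twice the median
of the other standard errors are illustrative choices, not values given by
Dillon, Kumar and Mulani (1987). The screen does not assess overall model fit
or substantive reasonableness, which must still be judged separately.

```python
from statistics import median


def screen_heywood_case(estimate, std_error, other_std_errors,
                        z=1.96, se_ratio_limit=2.0):
    """Screen a variance estimate for a Heywood case likely due to sampling error.

    z and se_ratio_limit are illustrative assumptions, not published cutoffs.
    """
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    ci_covers_zero = lower <= 0.0 <= upper
    se_comparable = std_error <= se_ratio_limit * median(other_std_errors)
    improper = estimate < 0.0
    return {
        "improper": improper,
        "ci_covers_zero": ci_covers_zero,
        "se_comparable": se_comparable,
        "plausibly_sampling_error": improper and ci_covers_zero and se_comparable,
    }


# Hypothetical estimates: one mildly negative, one far outside the proper range.
other_ses = [0.03, 0.05, 0.04]
mild = screen_heywood_case(-0.03, 0.04, other_ses)
severe = screen_heywood_case(-0.90, 0.10, other_ses)
print(mild["plausibly_sampling_error"], severe["plausibly_sampling_error"])
```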

MF Factors Inferred Across All Five MF Instruments.

The first-order factor model. MF factors described here are based on
all 30 MF parcels representing the five MF instruments (i.e., 3 M and 3 F
parcels for each MF instrument). Model 1 (Tables 2 and 3, considering only
the total group (TG) analyses for now) is a first-order model. It explains
responses to the 30 parcels in terms of 10 first-order factors: an M and
an F factor for each of the five instruments. The factor structure is well
defined in that all factor loadings are statistically significant, each of
the 10 factors accounts for a significant portion of the variance, and the
fit of Model 1 (Table 4) is reasonable. This first-order model is important
because its goodness of fit establishes an upper limit for the higher-order
models based on the same data (i.e., the models are nested) and because
higher-order models are based on it. The purpose of higher-order models is
to describe correlations among the first-order factors in terms of
higher-order factors, and so the correlations among the 10 first-order
factors in Model 1 (Table 3) are particularly important.

Insert Tables 2, 3 & 4 About Here

The MTMM perspective. The 10 first-order factors in Model 1 correspond
to M and F traits inferred from each of five instruments. The
correlations among these first-order factors (Table 3) represent a MTMM
matrix in which the multiple traits are M and F, and the multiple methods
are the five MF instruments. In MTMM studies it is typical to assess
convergent validity, discriminant validity, and method/halo effects.
Convergent validity is agreement between measures of the same trait assessed
by different methods. In MTMM terminology, the 10 correlations among the M
factors (.46 to .98; median = .59) and the 10 correlations among the F
factors (.23 to .80; median = .64) are convergent validity coefficients.
Discriminant validity refers to the distinctiveness of the different traits,
the ability to distinguish M from F. Method/halo effects are undesirable
biases that are idiosyncratic to a particular method of measurement.
Because the five MF instruments were constructed differently, particularly
with regard to the social desirability of items, it is likely that method
effects do exist and that these method effects are related to social
desirability. For example, the BSRI and the PAQ instruments contain only
socially desirable characteristics, and so it is likely that correlations
between M and F will be biased by social desirability when based on these
instruments.
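
The layout of the MTMM matrix described above can be made concrete by
classifying every pair of trait/method combinations. The sketch below uses the
conventional Campbell-Fiske cell labels; the function names are hypothetical.
With two traits and five methods it reproduces the counts given in the text:
10 convergent validities among the M factors and 10 among the F factors.

```python
from itertools import combinations


def classify_pair(a, b):
    """Label the correlation between two distinct (trait, method) measures."""
    (trait_a, method_a), (trait_b, method_b) = a, b
    if trait_a == trait_b and method_a != method_b:
        return "convergent"                 # same trait, different methods
    if trait_a != trait_b and method_a == method_b:
        return "heterotrait-monomethod"     # M and F from the same instrument
    return "heterotrait-heteromethod"       # different trait, different method


def count_mtmm_cells(traits, methods):
    """Count how many correlations fall into each MTMM cell type."""
    measures = [(t, m) for t in traits for m in methods]
    counts = {}
    for a, b in combinations(measures, 2):
        kind = classify_pair(a, b)
        counts[kind] = counts.get(kind, 0) + 1
    return counts


counts = count_mtmm_cells(["M", "F"], ["BSRI", "PAQ", "ANDRO", "CPI", "CPS"])
print(counts)
```

The five heterotrait-monomethod correlations (M with F within one instrument)
are the ones in which instrument-specific method effects would appear.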

MTMM matrices have traditionally been examined according to guidelines
such as those developed by Campbell and Fiske (1959; also see Marsh, in
press). Whereas the guidelines are useful (Marsh, in press), they have been
criticized and many researchers advocate the use of CFA for MTMM data (e.g.,
Bagozzi, 1980; Joreskog, 1974; Kenny, 1979; Marsh, in press; Marsh &
Hocevar, 1983; Schmitt & Stults, 1986; Widaman, 1985). In the CFA approach
factors defined by the multiple indicators of the same trait support the
construct validity of traits, whereas factors defined by variables
representing the same method argue for method/halo effects. These
researchers typically recommend that there should be at least three traits
and three methods so that each factor is defined by at least three measured
variables. In the present investigation there are only two traits, but Kenny
(1979; also see Marsh, 1988; in press) described an alternative
parameterization of the MTMM model for this situation that is used here. In
this parameterization, method variance is inferred from correlated residuals
for variables that share the same method of measurement (see Appendix I).
Marsh (1988; in press) examined this alternative parameterization of method
effects and recommended it for all MTMM studies even when there are three or
more traits and methods.

In most applications of CFA to MTMM data (e.g., Widaman, 1985) trait 
and method effects are inferred on the basis of correlations among scale 
scores that represent each trait/method combination. This could be 
accomplished here by taking an average of the parcels (or, equivalently, the
original items) used to define the M and F scores for each instrument, and
using these 10 scale scores as the starting point of subsequent analyses.
This 10x10 matrix of correlations among scale scores would be similar in
many respects to the corresponding matrix of correlations among the 10
first-order factors (i.e., latent constructs) in Table 3. The two
correlation matrices would differ in that: (a) the latent constructs are
optimally weighted combinations of measured variables whereas corresponding
scale scores are not; (b) the latent constructs are corrected for
measurement error whereas the corresponding scale scores are not; and (c)
the fit of the model used to derive the latent constructs (i.e., Model 1) is
explicitly tested as part of the analysis whereas the implicit factor
structure used to compute the scale scores is typically untested. Marsh and
Hocevar (1988) noted these advantages and argued that it is better to infer 
trait and method effects on the basis of correlations among latent traits 
instead of correlations among scale scores. They described how this could be 
accomplished with the use of HCFA. 

The HCFA approach to MTMM. Conceptually, a second-order factor
analysis is like conducting two separate factor analyses. The first factor
analysis is performed on relations among measured variables (item or parcel
scores) to obtain first-order factors. The second factor analysis is
performed on relations among the first-order factors to obtain second-order
factors. In the HCFA approach to higher-order factor analysis, both the
first- and second-order factors are actually estimated simultaneously. As
already noted, however, it is useful to carefully examine the fit and
parameter estimates for the first-order model before proceeding to the
higher-order models. This approach to HCFA is described in greater detail by
Marsh (1985; 1987a; 1987b; Marsh & Hocevar, 1985) and applied to MTMM data
by Marsh and Hocevar (1988).

In the HCFA approach to MTMM data (Marsh & Hocevar, 1988) each
trait/method combination is represented by a latent construct, one of the
first-order factors in Model 1. Trait and method effects are inferred on the
basis of second-order factors. Marsh and Hocevar (1988) described how the models
typically used to test for these effects in the CFA of MTMM data (e.g., Marsh,
in press; Widaman, 1985) can easily be translated into second-order models so
long as there are multiple indicators of each trait/method combination.

The second-order factor models considered here are illustrated in 
Figure 1 (also see Appendix I). In various models the 5 first-order M
factors are used to define a second-order global M factor (GM in Models 3,
4, 5 and 6), the 5 first-order F factors are used to define a second-order
global F factor (GF in Models 3, 4, 5 and 6), and all 10 first-order factors
are used to define a global trait factor (GMF in Models 2, 5 and 6). An
essential difference between these models is the number of higher-order 
factors that are hypothesized. 

In Models 4 and 6 method effects are tested by allowing correlations 
between the residual variance estimates (variance unexplained in terms of 
higher-order factors) of first-order factors derived from the same MF 
instrument. That is, a method effect is inferred when the correlation 
between two different traits (M and F) derived by the same method 
(instrument) is idiosyncratic to that method. For example, if the BSRI M and 
the BSRI F scores are more highly correlated than can be explained in terms 
of the correlation between global M and global F, a method effect is 
inferred. This representation of method effects is particularly useful when
there are only two traits associated with each method of measurement (Kenny,
1979; Marsh, in press). When each method of measurement is represented by at
least three traits, method effects can also be represented as method factors
(see Marsh, in press, for a comparison of the two approaches).
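
The logic of inferring a method effect from a correlated residual can be
illustrated numerically. The sketch below assumes standardized first-order
factors and a two-factor higher-order structure like Model 3, in which the
correlation implied between an instrument's M and F factors is the product of
their two higher-order loadings and the GM-GF correlation. Whatever part of
the observed correlation is not reproduced is the residual covariance that a
correlated residual would absorb. All values and function names are
hypothetical.

```python
def implied_mf_correlation(loading_m, loading_f, global_corr):
    """Correlation between an instrument's M and F factors implied by GM and GF."""
    return loading_m * loading_f * global_corr


def method_residual(observed_r, loading_m, loading_f, global_corr):
    """Residual covariance left for an instrument-specific method effect."""
    return observed_r - implied_mf_correlation(loading_m, loading_f, global_corr)


# Hypothetical standardized values for a single instrument.
implied = implied_mf_correlation(loading_m=0.8, loading_f=0.7, global_corr=-0.30)
residual = method_residual(observed_r=0.20, loading_m=0.8, loading_f=0.7,
                           global_corr=-0.30)
print(round(implied, 3), round(residual, 3))
```

In this example the instrument's M and F factors correlate more positively
than the global factors imply, the pattern expected when both scales contain
only socially desirable items.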

In HCFA, relations among first-order factors are fixed to be zero and
these relations are represented in terms of higher-order factors. For
example, Model 2 posits that all the relations among the first-order factors
(Table 3) can be explained in terms of just one second-order factor (GMF).
Because Model 2 has fewer estimated parameters, it is more parsimonious than
the corresponding first-order Model 1. It is important to emphasize that
Models 2-6 positing higher-order factors are all nested under the
first-order model (Model 1) so that none can fit the data any better than
Model 1. In this respect the fit of the first-order model represents an
optimum or target for the fit of all the higher-order models. The
higher-order models are, however, more parsimonious in that they use fewer
parameters to fit the data. Thus, to the extent that the fit of a
higher-order model approaches that of the corresponding first-order model and
the parameter estimates support the posited constructs, there is support for
the higher-order model.

Inferring global M (GM), global F (GF), and global MF (GMF) factors. 
HCFA models in Figure 1 (also see Appendix I) posit second-order trait 
factors (GM, GF, GMF) and method effects (correlated residuals) to explain 
correlations among the first-order factors. For now, only models fitted to 
the total group covariance are considered. The ability of alternative models 
to fit the data and their parameter estimates are used to infer the 
existence of trait and method effects. In Model 2 (Figure 1A) a single 
higher-order factor is posited to account for all the covariation among the 
10 first-order factors. If first-order M and F factors consistently load in 
the opposite direction on the second-order (GMF) factor and Model 2 is able 
to fit the data, then the results would support the bipolarity of MF. 
Inspection of the higher-order factor loadings (not shown) demonstrated that 
this factor was bipolar, but the model fits the data (TLI=.693 in Table 4) 
more poorly than models positing two or three higher-order factors. Much of 
the covariation among first-order factors is unexplained by GMF. 
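
What a single bipolar higher-order factor implies for the first-order factor correlations can be sketched as follows. This is an illustrative calculation only: it assumes a standardized higher-order factor with uncorrelated first-order residuals, and the four loadings are hypothetical rather than estimates from Model 2.

```python
def implied_first_order_corr(loadings):
    """Correlations among first-order factors implied by ONE standardized
    higher-order factor with uncorrelated residuals: r_ij = g_i * g_j."""
    size = len(loadings)
    return [
        [1.0 if i == j else loadings[i] * loadings[j] for j in range(size)]
        for i in range(size)
    ]


# Hypothetical loadings: two M factors load positively and two F factors
# load negatively on a bipolar higher-order factor.
labels = ["M_a", "M_b", "F_a", "F_b"]
loadings = [0.7, 0.6, -0.5, -0.8]
implied = implied_first_order_corr(loadings)

for label, row in zip(labels, implied):
    print(label, [round(value, 2) for value in row])
```

Under such a model every M/M and F/F correlation is positive and every M/F correlation is negative, and each is fully determined by the product of two loadings. Observed correlations that depart from this pattern are left unexplained, which is what a poor fit for the one-factor model reflects.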

In Model 3 (Figure 1B) two higher-order factors, GM and GF, are 
posited. This two-factor model provides a better fit (TLI=.755) than the 
one-factor model. Also, the modest correlation between GF and GM (-.23) 
indicates that GF and GM are distinguishable (i.e., not bipolar). In Model 4 
(Figure 1C), five correlated residuals are added to Model 3 to test for 
method effects. The inclusion of the correlated residuals substantially 
improved the fit (TLI=.805), implying that there are method effects. 
Furthermore, the correlation between GF and GM in Model 4 (-.36) is more 
negative than in Model 3 (-.23). This suggests that the method variance may 
have influenced the earlier estimate of the GM/GF correlation in Model 3. 

Model 5 (Figure 1D) combines the GMF factor posited in Model 2 and the 
GM and GF factors posited in Model 3. In Model 5 correlated residuals are not 
posited. Model 5 provided three well-defined higher-order factors and 
produced a substantially improved fit (TLI=.850). In Model 6 (Figure 1E), the 
five correlated residuals used to infer method effects were added and the fit 
improved modestly. The TLI (.866) for Model 6 is reasonable and the same as 
that of Model 1, indicating that most of the covariation among the 
first-order factors in Model 1 can be explained by Model 6. Because Model 6 
requires 19 fewer parameter estimates than Model 1, Model 6 is more 
parsimonious than Model 1. 
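
The fit indices quoted above can be recomputed from the chi-square values and degrees of freedom reported in Table 4. The sketch below assumes that Model 0 in Table 4 is the null model used for the TLI and uses the usual Tucker-Lewis formula; the target coefficient follows the definition in the note to Table 4 (ratio of the first-order chi-square to the higher-order chi-square).

```python
def tli(chi2_null, df_null, chi2_model, df_model):
    """Tucker-Lewis index computed from chi-square/df ratios."""
    null_ratio = chi2_null / df_null
    model_ratio = chi2_model / df_model
    return (null_ratio - model_ratio) / (null_ratio - 1.0)


def target_coefficient(chi2_first_order, chi2_higher_order):
    """Ratio of the first-order chi-square to the higher-order chi-square."""
    return chi2_first_order / chi2_higher_order


# Total-group values from Table 4: (chi-square, df).
NULL_MODEL = (4249.3, 435)
MODELS = {
    1: (783.2, 360),
    2: (1459.6, 395),
    3: (1241.2, 394),
    4: (1054.8, 389),
    5: (890.6, 384),
    6: (825.2, 379),
}

first_order_chi2, first_order_df = MODELS[1]
for label, (chi2, df) in MODELS.items():
    print(
        label,
        round(tli(NULL_MODEL[0], NULL_MODEL[1], chi2, df), 3),
        round(target_coefficient(first_order_chi2, chi2), 3),
        df - first_order_df,  # parameters saved relative to Model 1
    )
```

The last column is the difference in degrees of freedom from Model 1, which for Model 6 equals the 19 fewer parameter estimates noted in the text.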

A more detailed inspection of parameter estimates in Model 6 (Table 5) 
facilitates the interpretation of these higher-order factors. Four of five M 
factors (all but the CPS) load positively and significantly on GM; three 
of five F factors (all but CPS and CPI) load positively and 
significantly on GF; all 5 M factors load significantly and positively on 
GMF and all 5 F factors load significantly and negatively on GMF. Thus the 
higher-order factors are well-defined. 

The CPS M and F factors that earlier analyses showed to represent a 
bipolar factor load almost exclusively on GMF. The CPI was also designed to 
measure a bipolar MF, and the CPI M and F factors tend to have higher 
loadings on GMF than on GM or GF. The PAQ and BSRI were designed to measure 
distinguishable M and F factors with socially desirable items, and factors 
from these two instruments tend to have higher loadings on GM and GF than on 
GMF. The ANDRO M and F factors were also designed to infer distinguishable M 
and F factors, but they load more substantially on the GMF factor than on 
the GM and GF factors. However, the ANDRO M and F scales tend to be 
negatively correlated with social desirability (Table 1). This suggests that 
the GM and GF factors in Model 6 may reflect primarily the socially 
desirable aspects of the masculine and feminine stereotypes. Consistent with 
this interpretation but in contrast to earlier models, the correlation 
between GM and GF is positive (.34) instead of negative as in Models 2-4. 

Insert Table 5 About Here 
In Model 6 three higher-order trait factors were posited, and 
correlated residuals were used to assess method effects. The addition of the 
correlated residuals in this model had a much smaller effect (Model 6 vs. 5) 
than when only two higher-order trait factors were estimated (Model 4 vs. 
Model 3). This suggests that much of what initially appeared to be error due 
to method effects can be explained in terms of the three higher-order trait 
factors. The results have important theoretical implications in that they 
provide support for both the bipolar GMF posited by traditional personality 
theorists and the separate GM and GF factors posited in androgyny theory. 

Despite the intuitive appeal of the interpretation of Model 6, there 
are also problems. First, the correlation between the residuals for the CPS 
M and F factors is larger than the residual variance for either factor. 
This problem was demonstrated in Table 1 when the correlation between M and 
F was more negative than -1.0 after correction for attenuation, and was also 
found in a CFA study based on the normative data base for the CPS instrument 
(Marsh, 1985). Thus, the problem is not specific to this model or even to 
these data. Using item-level data, Marsh (1985) demonstrated that this 
situation was due to the fact that CPS M items were logically opposed to CPS 
F items. Using opposites forced the correlation between the M and F scores 
to be more negative than would be expected from the internal consistency of 
responses within each scale. Second, the residual variance term for the PAQ 
M factor is slightly negative. Since this offending parameter is not 
significantly different from zero and its standard error is not excessive, 
this Heywood case is apparently due to sampling error. [The residual variance 
is the amount of variance in a first-order factor that is unexplained in 
terms of second-order factors, and small residuals mean that a first-order 
factor is well explained by higher-order factors.] These problems, though 
apparently not serious, dictate caution in interpreting the results. 
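
How a correlation can fall below -1.0 after correction for attenuation is easy to see from the classical disattenuation formula. The sketch below uses that standard formula; the observed correlation and the two reliabilities are hypothetical and are not the CPS values from Table 1.

```python
import math


def disattenuated_correlation(r_xy, reliability_x, reliability_y):
    """Classical correction for attenuation: the observed correlation
    divided by the square root of the product of the two reliabilities."""
    return r_xy / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: a strongly negative observed M/F correlation and
# only moderate internal consistency within each scale.
corrected = disattenuated_correlation(-0.80, 0.75, 0.80)
print(round(corrected, 3))
```

Whenever the observed correlation is larger in absolute value than the square root of the product of the reliabilities, the corrected estimate lies outside the admissible range, which is the pattern described above for logically opposed M and F items.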

In summary, three higher-order traits are defined by the set of five MF 
instruments. One factor is clearly identified as the bipolar GMF posited in 
traditional personality instruments such as the CPS. However, the reasonably 
distinguishable facets of GF and GM posited by androgyny theory are also 
clearly evident. The pattern of loadings and the positive correlation 
between GM and GF suggest that these higher-order traits are inferred from 
socially desirable attributes that are relatively unique to masculine and 
feminine stereotypes, and this interpretation also appears to be consistent 
with androgyny theory. Further tests of the construct validity of these 
interpretations will be considered in the next section. 
MF Factors: Their Relation to Other Constructs 

The purpose of this section is to examine relations between the 
higher-order MF factors, social desirability, esteem, and other constructs. 
This is accomplished by adding measures of these new constructs to models 
considered in the last section. These relations between the previously 
identified factors and these new constructs are used to test the construct 
validity of earlier interpretations of GM, GF and GMF. The nature of these 
tests is discussed in more detail as part of the presentation of the results. 
Because the relations between MF responses and these constructs are not the 
major focus of the present investigation and were examined in detail by 
Marsh, Antill and Cunningham (1987) using the same data, the results are 
considered only briefly here. 

Insert Table 6 About Here 



Social Desirability. In the first pair of analyses, a social 
desirability factor (inferred from the 6 parcels, 3 from each of the two 
social desirability measures) was added to the 10 MF factors considered 
earlier. In one such model, social desirability was related to GM and GF 
(i.e., a GMF was not posited). As demonstrated in Table 1 for raw scale 
scores, the social desirability factor was positively correlated with BSRI 
and PAQ responses, relatively uncorrelated with CPS and ANDRO responses, and 
negatively correlated with CPI responses. Social desirability was also 
substantially more positively correlated with GF than with GM (Table 6). 

When GM, GF, and GMF were posited, social desirability was more 
positively correlated with both GM and GF, but relatively uncorrelated with 
GMF. These observations are consistent with earlier interpretations 
suggesting that GM and GF represented primarily the socially desirable 
aspects of M and F when GM, GF and GMF were included in the same model.² 

Esteem. In the second pair of analyses, an esteem factor (inferred from 
the 6 parcels, 3 from each of the two esteem measures) was added to the 10 
MF factors considered earlier. As demonstrated with the raw scale scores in 
Table 1, esteem was substantially more positively correlated with M than 
with F for each of the MF instruments. When just two higher-order factors 
(GM and GF) were posited (see Table 6), the GM/esteem correlation (.69) was 
very large and positive whereas the GF/esteem correlation was small and 
negative (-.14). However, when three higher-order factors were posited, 
esteem was positively correlated with GM (.33), GF (.29) and GMF (.43). This 
is consistent with the suggestion that the GM and GF factors reflect 
socially desirable aspects of M and F. 

Biological gender and the adjectives "masculine" and "feminine." In 
order to further test the construct validity of interpretations of the 
higher-order MF factors, gender and responses to the adjectives "masculine" 
and "feminine" (items from BSRI) were added to models with two higher-order 
(GM, GF) factors and to models with three higher-order (GM, GF, GMF) 
factors. Biological gender (1=male, 2=female) is a bipolar construct, and 
other researchers (e.g., Pedhazur and Tetenbaum, 1979) have reported that 
the adjectives "masculine" and "feminine" from the BSRI define a two-item 
bipolar factor. Support for the earlier interpretation requires that each 
of these new variables should correlate in the appropriate direction with 
the three higher-order factors, but that each should correlate substantially 
more with GMF than with either GM or GF. 

When just two higher-order (GM, GF) factors are posited, biological 
gender, the adjective "masculine", and the adjective "feminine" are each 
correlated in the expected direction with GM and GF (Table 6). When three 
higher-order (GM, GF, GMF) factors are considered, gender (1=male, 2=female) 
is positively correlated with GF and negatively correlated with GM. Gender, 
however, is substantially more related to GMF than to either GM or GF. The 
single-item factor defined by responses to the adjective "masculine" is 
positively correlated with GM and negatively correlated with GF, but it is 
much more substantially correlated with GMF. The single-item factor defined 
by responses to the adjective "feminine" is positively correlated with GF 
and negatively correlated with GM, but again its largest correlation is with 
GMF. Because these additional constructs are bipolar constructs and they 
correlate more substantially with bipolar GMF than with GM or GF, the 
results support the construct validity of interpretations of the three 
higher-order factors. 

Examination of the MF factor structure within, across, and between gender 

Parameter estimates for CFA models can be examined for responses by men 
and women separately, for the total group covariance matrix, or for the 
pooled within-group covariance matrix that removes the effect of gender. 
Because there are substantial gender differences in responses to M and F 
scales, each approach is likely to result in different parameter estimates. 
Theoretical, empirical, and pragmatic considerations led to the decision to 
focus on the total group covariance matrix in the present investigation. The 
purpose of discussion and results presented in this section is to further 
examine the basis of this decision and its implications. 

Are the factor structures underlying MF responses similar 
for men and women? A host of theoretical and philosophical issues relate to 
this question, but the present focus is on methodological issues. Comparing 
responses by men and women requires at least certain aspects of the factor 
structures to be equivalent, and pooling responses across groups requires 
even more stringent assumptions (e.g., Cole & Maxwell, 1985). Whereas 
exploratory factor analysis is generally inappropriate for examining issues 
of factorial invariance, CFA is ideally suited to this purpose (see Marsh, 
1985). With multigroup CFA, the equivalence of any one or any set of 
parameter estimates across groups can be tested, and hierarchies of nested 
models have been proposed for this purpose (e.g., Alwin & Jackson, 1981; 
Cole & Maxwell, 1985; Jöreskog & Sörbom, 1981; Marsh & Hocevar, 1985). The 
most general test, no matter what the hypothesized model, is a test of the 
equality of the entire variance/covariance matrix across groups. This test, 
Box's M (see Cole & Maxwell, 1985), can be conducted using the MANOVA 
procedure in SPSSx (1986), which also creates a pooled within-group 
covariance matrix. The logic of this test is that so long as the covariance 
matrices are equivalent, then structures based on more restrictive models 
will also be equivalent. 

Two different tests of the equality of covariance matrices based on 
responses by men and by women were conducted. First, the equivalence of the 
30x30 covariance matrices representing the 30 MF parcels derived from all 5 
MF instruments was tested. The χ² of 510 (df=465, N=237, p > .05) was not 
significant, thus supporting the equivalence of the covariance matrices. 
Second, the equivalence of the 42x42 covariance matrices representing the 30 
MF parcels, the 6 esteem parcels, and the 6 social desirability parcels was 
tested. Again, the χ² of 974 (df=903, N=237, p > .05) was not statistically 
significant, thus supporting the equivalence of these expanded covariance 
matrices. The omnibus nature of these tests (i.e., the simultaneous test of 
a large number of parameters), the modest sample sizes, and the use of 
nonsignificant statistical tests as a basis of support for a null hypothesis 
all dictate caution in the interpretation of the finding. Nevertheless, the 
findings provide a reasonable basis for pooling responses by men and women 
in subsequent analyses. 
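
The omnibus test described above can be sketched in a few lines. This is a minimal standard-library illustration of Box's M with its usual chi-square approximation, not the SPSSx MANOVA procedure used in the study; the function names, the 2x2 covariance matrices, and the group sizes in the example are hypothetical (the text reports only the total N of 237).

```python
import math


def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix (Cholesky)."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return 2.0 * sum(math.log(lower[i][i]) for i in range(size))


def box_m_df(n_variables, n_groups):
    """Degrees of freedom for the chi-square approximation to Box's M."""
    return n_variables * (n_variables + 1) * (n_groups - 1) // 2


def box_m_test(covariances, group_sizes):
    """Return (approximate chi-square, df) for equality of covariance matrices."""
    p, k = len(covariances[0]), len(covariances)
    n_total = sum(group_sizes)
    # Pooled within-group covariance: group matrices weighted by (n_i - 1).
    pooled = [
        [
            sum((n - 1) * c[i][j] for c, n in zip(covariances, group_sizes))
            / (n_total - k)
            for j in range(p)
        ]
        for i in range(p)
    ]
    m_stat = (n_total - k) * log_det(pooled) - sum(
        (n - 1) * log_det(c) for c, n in zip(covariances, group_sizes)
    )
    correction = ((2 * p * p + 3 * p - 1) / (6.0 * (p + 1) * (k - 1))) * (
        sum(1.0 / (n - 1) for n in group_sizes) - 1.0 / (n_total - k)
    )
    return (1.0 - correction) * m_stat, box_m_df(p, k)


# Hypothetical two-variable example with hypothetical group sizes.
cov_a = [[1.0, 0.3], [0.3, 1.0]]
cov_b = [[1.0, -0.3], [-0.3, 1.0]]
print(box_m_test([cov_a, cov_b], [100, 137]))
print(box_m_df(30, 2), box_m_df(42, 2))
```

For two groups the degrees of freedom equal the number of distinct elements in one covariance matrix, which gives 465 for the 30 MF parcels and 903 for the 42 parcels, matching the values reported in the text.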

A second issue is whether analyses should be performed on the pooled 
within-group covariance matrix that removes the effect of gender, or on the 
total group covariance matrix that includes the effect of gender. It is 
well-known that spurious correlations can result when groups differing on 
some irrelevant variable are combined. However, the effect of gender can 
hardly be considered an irrelevant variable in the study of MF. To the 
extent that gender is a valid source of variance to MF responses, then it is 
theoretically appropriate to conduct analyses on the total group covariance 
matrix, as in the present investigation. However, because it is also 
relevant to know how gender affects the MF factor structure, additional 
analyses were conducted on the pooled within-group covariance matrix for 
selected models (those results designated by WG in Tables 2-5). The 
comparison of results of the same model fit to these two matrices (the 
total group and the pooled within-group covariance matrices) indicates the 
effect of gender on the MF factor structure. 

The first-order model (Model 1) posited 10 MF factors to fit responses 
to the 30 MF parcels. This model was fit to both the total-group covariance 
matrix (Model 1) and the pooled within-group covariance matrix in which the 
effect of gender is removed (Model 1a). Whereas the factor structure is 
well-defined in both analyses (Tables 2-4), there are important differences. 
The factor loadings are systematically smaller for the pooled within-group 
analysis (Table 2). Since some of the variance in the M and F factors is 
related to biological gender, partialling out the effect of gender reduces 
the variance in M and F factors. 

There are also systematic differences in the correlations among the 
factors for the two analyses (Table 3). Correlations among the different M 
factors and among the different F factors are smaller for the pooled 
within-group analyses. Hence, partialling out the effect of gender reduces 
the apparent agreement among the different MF instruments. Also, correlations 
between M factors and F factors are less negative in the pooled within-group 
analyses. Hence, partialling out the effects of gender reduces the apparent 
"bipolarity" of MF responses. 
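
The difference between total-group and pooled within-group results can be illustrated with a small constructed data set. The scores below are hypothetical and were chosen so that M and F are uncorrelated within each gender while the group means differ in opposite directions; the sketch only demonstrates the arithmetic, not the analyses reported here.

```python
import math


def sscp(xs, ys):
    """Sum of cross-products of deviations from the means of xs and ys."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def total_and_within_corr(groups):
    """groups: list of (m_scores, f_scores) pairs, one pair per group.
    Returns the M/F correlation in the total group and the pooled
    within-group correlation (group means removed before pooling)."""
    all_m = [score for m, _ in groups for score in m]
    all_f = [score for _, f in groups for score in f]
    total = sscp(all_m, all_f) / math.sqrt(sscp(all_m, all_m) * sscp(all_f, all_f))
    within_mf = sum(sscp(m, f) for m, f in groups)
    within_mm = sum(sscp(m, m) for m, _ in groups)
    within_ff = sum(sscp(f, f) for _, f in groups)
    within = within_mf / math.sqrt(within_mm * within_ff)
    return total, within


# Hypothetical scores: men are higher on M and lower on F, women the reverse.
men = ([2, 2, 0, 0], [0, -2, 0, -2])
women = ([0, 0, -2, -2], [2, 0, 2, 0])
total_r, within_r = total_and_within_corr([men, women])
print(round(total_r, 2), round(within_r, 2))
```

In this constructed case the total-group M/F correlation is negative even though the within-group correlation is zero, and the total-group sums of squares are larger than the pooled within-group sums of squares. Both patterns parallel the direction of the differences described above.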

Selected higher-order models were also fit to the within-group 
covariance matrix (Table 4). The fit of these models is somewhat better, 
partly because there is less covariance to be explained when the effect of 
gender is removed. The comparison of the relative fit of the models again 
supports the inclusion of GM, GF, and GMF to represent trait effects and 
correlated errors to represent method effects (Model 6a). When just GM and 
GF are posited, the negative GM/GF correlation observed in Model 4 is close 
to zero when based on the pooled within-group covariance matrix (Model 4a). 
When GM, GF, and GMF factors are posited, factor loadings for GM and GF are 
little affected, but factor loadings for GMF are generally smaller for the 
analysis of the pooled within-group matrix (Table 5). These results suggest 
that although factor structures are similar for both analyses, removing the 
effect of gender reduces the apparent bipolarity of MF responses. 

The observed pattern of differences between analyses based on 
total-group and pooled within-group covariance matrices is not surprising. In 
fact, the construct validity of the MF responses would be suspect if such a 
pattern had not occurred. The results do, however, demonstrate the 
implications of this important methodological consideration. As described 
earlier, the theoretical position taken here is that the total group 
analyses are appropriate because gender is a valid source of variance to MF 
responses. From this perspective, results presented here support the 
construct validity of the MF responses. 

The finding that separate covariance matrices based on responses by men 
and by women are not significantly different has important methodological 
implications that were emphasized here. The substantive implications of 
these findings may, however, be even more important. First, the finding 
implies that the factor structure underlying responses to a diverse set of 
MF instruments does not differ significantly for men and women. Second, the 
finding implies that the relations of MF responses to esteem and social 
desirability do not differ significantly for men and women. 

Discussion 

The Structure of MF Responses 

The most salient distinction between androgyny and traditional 
approaches to the study of MF has been the proposed structure of MF. 
Previous research has focused on the choice between distinguishable M and F 
traits posited by androgyny theory, and the single bipolar MF trait posited 
by traditional personality approaches, as if the two models were mutually 
exclusive. It is clear, however, that MF instruments can be constructed so 
as to produce either bipolar or relatively independent traits. For example, 
the M and F traits measured by the BSRI and PAQ may be more accurately 
designated as measures of assertiveness/dominance and of nurturance 
respectively (Spence, 1984), and these traits are relatively independent. 
Furthermore, the use of just socially desirable items on the BSRI and PAQ is 
likely to produce MF correlations that are more positive than scales that 
are balanced in relation to social desirability. In contrast, items strongly 
linked to gender (as on the CPI) or logically opposed items (as on the CPS) 
will produce a much more negative MF correlation. From this perspective an 
important substantive contribution of the present investigation is the 
demonstration that three higher-order MF factors are needed to explain 
responses to the five MF instruments. In contrast to previous 
demonstrations that sought to contrast one-factor (GMF) and two-factor (GM 
and GF) structures, the present results clearly identified all three (GM, 
GF, and GMF) factors. Thus the results provide support for both the 
androgyny and the traditional perspectives. 

The idea that GM, GF, and GMF all exist simultaneously may be novel, but 
empirical support for this contention has been found previously. Three 
orthogonal factors similar to the ones found here were reported by Pedhazur 
and Tetenbaum (1979) in their factor analysis of BSRI responses. Their M and 
F factors were defined by socially desirable masculine and feminine 
characteristics whereas their bipolar MF factor was defined by the adjectives 
"masculine" and "feminine." Since the "masculine" and "feminine" adjectives 
had such strong face validity, they interpreted the findings to mean that the 
M and F factors may lack validity. Instead, the present results suggest that 
all three factors represent distinct components of the MF construct. 
The Relation of MF to Gender 

Gender is consistently related to M in one direction and to F in the 
opposite direction. Thus it is no surprise that removing the effect of 
gender reduces the variance in both M and F, and also makes the MF 
correlation less negative or more positive. The position taken here is that 
this variance attributable to gender is a valid source of variance in MF 
responses. Perhaps a more neutral position is that it is a source of 
variance that needs to be considered. This is relevant because variance 
attributable to gender is typically eliminated when researchers conduct 
separate analyses on responses by men and by women. The decision to use 
separate group covariance matrices is often based on assumed differences in 
the factor structure for responses by men and women. However, the present 
investigation provided support for the invariance of the factor structures, 
and this is a substantively important finding. Nevertheless, it must be 
emphasized that even when within-group factor structures are equivalent, 
this within-group factor structure will differ systematically from the total 
group factor structure. 

This methodological issue also has important implications for other 
personality research that examines factor structures within, across or 
between subgroups that are not amenable to random assignment (e.g., sex, 
race, SES, age). Typically there is no a priori basis for concluding that 
any one approach is necessarily superior. As demonstrated here, the best 
approach is to compare the empirical and theoretical implications of the 
different approaches. In pursuing this comparison, an omnibus test of the 
equality of subgroup covariance matrices such as Box's M is a useful 
starting point. When there is support for this equality, subsequent analysis 
of either the total-group or pooled within-group covariance matrix is 
justified and the comparison of both approaches is recommended. When the 
omnibus test indicates that the subgroup covariance matrices are not 
equivalent, further analyses can be conducted to determine what aspects 
(e.g., factor loadings, factor correlations, uniquenesses) of the 
first-order or second-order factor structures differ (see Alwin & Jackson, 
1981; Marsh & Hocevar, 1985). 
1 

Marsh (1987a) and Marsh and Hocevar (1985) use this relation between a 
higher-order factor model and its corresponding first-order factor model to 
define the target coefficient that is used as a goodness-of-fit indicator in 
Table 4. 
2 

In supplemental analyses, the effects of social desirability were 
partialled out of MF responses (see Appendix I). This made MF correlations 
more negative for PAQ, BSRI and CPI, but had almost no effect for ANDRO and 
CPS. Correlations among M factors, and correlations among F factors, were 
somewhat higher when the effect of social desirability was removed. Because 
these correlations are the convergent validities in MTMM analyses, these 
results are consistent with earlier suggestions that social desirability 
acts like a method effect. Partialling out social desirability also 
substantially reduced the effect of introducing correlated uniquenesses. 
Thus, much, but apparently not all, of the method effects were associated 
with the social desirability factor. 
Table 2 

The First-order Model For Total Group (TG) and Pooled Within-group (WG) 
Covariance Matrices (Models 1 and 1a in Table 4): Factor Loadings and 
Error/Uniquenesses 

                          First-order                 Error/
                        Factor Loadings           Uniquenesses (b)
Factor  Variable (a)     TG           WG            TG         WG

M1      BSRIM1          .74 (c)      .63 (c)       .21*       .18*
        BSRIM2          .66*         .63*          .21*       .20*
        BSRIM3          .67*         .64*          .14*       .14*
F1      BSRIF1          .42 (c)      .36 (c)       .23*       .23*
        BSRIF2          .73*         .59*          .07*       .07*
        BSRIF3          .37*         .38*          .19*       .17*
M2      CPIM1           [illeg.] (c) [illeg.] (c)  .17*       .16*
        CPIM2           .39*         .29*          .16*       .17*
        CPIM3           .34*         .26*          .20*       .21*
F2      CPIF1           .31 (c)      .23 (c)       .18*       .19*
        CPIF2           .31*         .27*          .19*       .19*
        CPIF3           .48*         .47*          .21*       .19*
M3      PAQM1           .64 (c)      .62 (c)       .17*       .17*
        PAQM2           .70*         .68*          .15*       .18*
        PAQM3           .59*         .56*          .20*       .20*
F3      PAQF1           [illeg.] (c) [illeg.] (c)  .31*       .29*
        PAQF2           [illeg.; one entry, .12*, survives; column unclear]
        PAQF3           .66*         .64*          .10*       .11*
M4      ANDROM1         .14 (c)      .13 (c)       .02*       .02*
        ANDROM2         .14*         .12*          .02*       .02*
        ANDROM3         .18*         .17*          .02*       .02*
F4      ANDROF1         .12 (c)      .12 (c)       .02*       .02*
        ANDROF2         .11*         .11*          .02*       .01*
        ANDROF3         .14*         .11*          .02*       .02*
M5      CPSM1           .67 (c)      .55 (c)       .69*       .69*
        CPSM2           .80*         .66*          .64*       .64*
        CPSM3           .69*         .48*          .82*       .81*
F5      CPSF1           .85 (c)      .77 (c)       .77*       .75*
        CPSF2           .76*         .55*          .66*       .65*
        CPSF3           .80*         .62*          .84*       .82*

Note. Parameter estimates are in standardized form to facilitate 
interpretation. Factor correlations are presented in Table 3. 

* p < .05. 

(a) The measured variables were three randomly formed subscales from each 
M and F scale (e.g., BSRIM1, BSRIM2, BSRIM3 are the three M subscales 
from the BSRI that define M1). Because each subscale was allowed to 
define only one factor, factor loadings are presented as a single column 
instead of as a 30 (measured variables) by 10 (factors) matrix. 
(b) Error/uniquenesses were estimated in a diagonal 30 (variables) x 30 
matrix that assumed uncorrelated errors among the variables, and so are 
presented as a column. 
(c) The first factor loading for each factor was fixed at 1.0 to serve as 
a reference indicator and so no test of statistical significance was 
performed. 
Table 3 

The First-order Model For Total Group (TG) and Pooled Within-group (WG) 
Covariance Matrices (Models 1 and 1a in Table 4): Factor Correlations 

              BSRI                CPI                 PAQ                 ANDRO               CPS
          M1        F1        M2        F2        M3        F3        M4        F4        M5        F5

M1  TG    1 (a)
    WG    1
F1  TG   -.18*      1
    WG    .05       1
M2  TG    .52*     -.27*      1
    WG    .37*      .12       1
F2  TG   -.42*      .45*     -.09       1
    WG   -.29*      .22*      .30*      1
M3  TG    .97*     -.12       .49*     -.42*      1
    WG   -.97*      .03       .41*     -.36*      1
F3  TG   [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]   1
    WG    .19*      .81*      .07       .12       .33*      1
M4  TG   [illeg.]  [illeg.]   .59*     -.58*      .87*     -.12       1
    WG    .76*     -.26*      .43*     -.47*      .86*     -.02       1
F4  TG   -.26*      .71*     -.21*      .65*     -.24*      .60*     -.43*      1
    WG   -.11       .62*      .05       .54*     -.14       .58*     -.30*      1
M5  TG   [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  -.22*      .72*     -.54*      1
    WG    .30*     -.33*      .32*     -.56*      .47*     -.09       .63*     -.41*      1
F5  TG   [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  [illeg.]  -.63*      .63*    -1.11*      1
    WG   [illeg.]   .40*      .07       .60*     -.39*      .12      -.51*      .50*    -1.16*      1

Note. See footnotes in Table 2. 

* p < .05. 

(a) In unstandardized form the factor variances were: (.62, .22, .14, .17, 
.45, .35, .03, .02, .68, .85) for TG and (.41, .13, .07, .06, .38, 
.15, .02, .01, .31, .59) for WG. All factor variances were 
statistically significant. 
Table 4 

MTMM Models Positing Global Trait Factors or Method Effects To Explain 
Responses To Five MF Instruments 

                                                            GM/GF
Model       χ²       df     χ²/df    TLI (a)    TC (b)    Correlation

Total Group Analysis
0         4249.3     435     9.77
1          783.2     360     2.18     .866      1.000        --
2         1459.6     395     3.70     .693       .537        --
3         1241.2     394     3.15     .755       .631       -.23**
4         1054.8     389     2.71     .805       .743       -.36**
5          890.6     384     2.32     .850       .879        .35**
6          825.2     379     2.18     .866       .949        .34**

Pooled Within-Group Analysis
0a        3620.0     435     8.32
1a         716.8     360     1.99     .865      1.000
4a         907.5     389     2.33     .818       .790        .02
6a         822.3     379     2.17     .840       .872        .51**

Note. Six substantive models (2.1 - 2.6) were fit to the Total Group 
Covariance Matrix and three of these models (2.1a, 2.4a, 2.6a) were also fit 
to the pooled within-group covariance matrix. Parameter estimates for Models 
2.6 and 2.6a are presented in Table 7. See Appendix II for a description of 
the models. 

* p < .05. 

(a) TLI = Tucker Lewis index (Bentler & Bonett, 1980). 
(b) The Target coefficient (TC), designed specifically for HCFA (see Marsh 
& Hocevar, 1985), is defined as the ratio of the χ² for the first-order 
model (Model 2.1) and any higher-order model. It provides an estimate of 
the variance in the first-order model that can be explained by the 
higher-order model. It has a maximum of 1.0 when all covariation among the 
first-order models can be explained by higher-order factors. 
Table 5

HCFA Models 6 (TG) and 6a (WG): Second-order Factor Loadings, First-order
Factor Residuals, and Correlations Between First-Order Factor Residuals

              Factor loadings for:
              Global M          Global F          Global MF         Second-Order      Correlated
                                                                    Residuals         Residuals(a)
Factor        TG       WG       TG       WG       TG       WG       TG       WG       TG       WG

BSRI   M1     .81*     .84*     0        0        .51*     [illeg.] .10*     .09
       F1     0        0        .66*     .77*     [illeg.] [illeg.] [illeg.] .18*     -.03     -.01
CPI    M2     .29*     .43*     0        0        .51*     [illeg.] [illeg.] .71
       F2     0        0        -.03     -.11     [illeg.] [illeg.] .37*     .61      .32*     .35
PAQ    M3     .89*     .87*     0        0        .52*     .56*     -.05     -.06
       F3     0        0        .90*     .90*     -.31*    -.24*    .10      .14      .12*     .10*
ANDRO  M4     .52*     .48*     0        0        .78*     .78*     .12*     .15*
       F4     0        0        .41*     .45*     -.71*    -.64*    .33*     .38*     .07*     .10
CPS    M5     .05      .04      0        0        .89*     .78*     .21*     .39*
       F5     0        0        .03      .02      -.86*    -.69*    .26*     .52*     -.34*    -.58*



Note. Factor loadings for the 30 measured variables and their
error/uniquenesses are not shown because they are so similar to those for the
corresponding first-order models (Table 2). Parameter estimates are in
standardized form to facilitate interpretation.
* p < .05.
a Method effects in the MTMM models were represented as correlated residuals
between pairs of M and F factors from the same instrument.
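
Because the estimates are standardized, each first-order factor's unit variance should split into the part explained by its second-order factors plus its residual. The sketch below checks this for four rows of Table 5 using the total-group (TG) entries. It assumes, which the note does not state, that the global MF factor is uncorrelated with global M and global F, so no cross-product term enters. The function name is hypothetical.

```python
def implied_residual(loading_specific, loading_gmf):
    """Residual variance of a standardized first-order factor that loads on
    one of GM/GF and on GMF, assuming GMF is orthogonal to GM and GF."""
    return 1.0 - loading_specific ** 2 - loading_gmf ** 2


# (loading on GM or GF, loading on GMF, residual reported in Table 5), TG values.
tg_rows = {
    "M4": (0.52, 0.78, 0.12),
    "F4": (0.41, -0.71, 0.33),
    "M5": (0.05, 0.89, 0.21),
    "F5": (0.03, -0.86, 0.26),
}

implied = {}
for name, (specific, gmf, reported) in tg_rows.items():
    implied[name] = implied_residual(specific, gmf)
    print(f"{name}: implied {implied[name]:.2f}, reported {reported:.2f}")
```

For these four rows the implied residuals round to the tabled values. Small discrepancies in other rows would be expected from the two-decimal rounding of the loadings.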
Table 6

Relations of Higher-order MF Factors (GM, GF, and GMF) to Social Desirability,
Esteem, Gender, and Responses to the Adjectives "Masculine" and "Feminine"

                        Models Containing:
                        2 Higher-order        3 Higher-order
                        Factors               Factors
                        GM        GF          GM        GF        GMF

Social Desirability     .16*      .49**       .42**     .58**     .16
Esteem                  .69**     -.14*       .52**     .34**     .48**
Gender                  -.41**    .5?**       -.07      .1?**     -.58**
Masculine               .53**     -.56**      .19**     -.27**    .63**
Feminine                -.47**    .64**       -.10      .34**     -.67**



Note. Factor correlations are based on two models like those described earlier
except that indicators of additional constructs were added that were correlated
with the higher-order MF factors. One model included only two higher-order
factors (GM and GF) whereas the second contained three higher-order MF factors
(GM, GF, and GMF). In three sets of analyses, the higher-order factors were
related to: (a) social desirability, (b) esteem, and (c) gender and responses
to the adjectives "masculine" and "feminine."
* p < .05.
Appendix I: HCFA Model Specifications in Terms of LISREL Design Matrices

Below are the LISREL design matrices for Model 6 (Figure 1). In this
problem there are 30 measured variables (30 MF subscales called mf1 - mf30),
10 first-order factors (M1 - M5; F1 - F5), and 3 second-order factors (GM,
GF, GMF) used to explain relations among the 10 first-order MF factors. The
four design matrices contain parameters to be estimated (represented as
letters a to g), and parameters with fixed values of either 0 or 1. LAMBDA Y
is a 30 (measured variables) x 13 (factors) matrix that contains estimated
factor loadings (the "a"s) and factor loadings with fixed values (the 1s)
that serve as reference indicators. THETA is a 30x30 matrix of uniquenesses
(the "b"s) of the measured variables. THETA is specified as a diagonal
matrix, indicating that the uniquenesses are uncorrelated, and thus is
presented as a single column of values. BETA is a 13x13 matrix that contains
second-order factor loadings (the "c"s). PSI is a 13x13 matrix that contains
the residual variances for first-order factors (the "d"s), correlations among
residual variances that are used to reflect method effects (the "e"s),
second-order factor variances fixed to unity (the 1s), and correlations
among second-order factors (the "f"s).

Other HCFA models can be easily represented in terms of the four design
matrices. For example, Model 4 (Figure 1) differs from the one presented
here only in that only two higher-order factors (GM and GF) were posited.
For this model, LAMBDA Y is a 30x12 matrix (the last column is eliminated),
BETA and PSI are 12x12 matrices (the last row and column are eliminated), and
THETA remains the same. When the six indicators of esteem were added
to Model 6 (see Table 6), LAMBDA Y became a 36x14 matrix (reflecting the 6
additional measured variables and 1 additional factor), BETA and PSI became
14x14 matrices (reflecting the one additional factor), and THETA became a
36x36 matrix (reflecting the six additional measured variables).
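
The layout of the four design matrices can be sketched as boolean masks that mark which cells are freely estimated. The code below is a minimal illustration, not the LISREL setup itself. It assumes that factors are ordered M1-M5, F1-F5, then GM, GF, GMF, that the first indicator of each factor is the reference indicator, that three measured variables define each first-order factor, and that the GM-GF correlation is the only freely estimated second-order correlation. All names are hypothetical.

```python
import numpy as np

N_INSTRUMENTS = 5   # five MF instruments, each with one M and one F factor
N_INDICATORS = 3    # measured variables per first-order factor (assumed)


def free_masks(n_second_order=3, correlated_residuals=True):
    """Boolean masks of freely estimated cells in the four design matrices.

    Assumed ordering: M1..M5, F1..F5, then GM, GF and (if present) GMF.
    """
    n_first = 2 * N_INSTRUMENTS
    n_vars = n_first * N_INDICATORS
    n_factors = n_first + n_second_order
    m_rows = np.arange(N_INSTRUMENTS)
    f_rows = m_rows + N_INSTRUMENTS
    gm, gf = n_first, n_first + 1

    # LAMBDA Y: first indicator of each factor fixed to 1, the others free ("a").
    lambda_y = np.zeros((n_vars, n_factors), dtype=bool)
    for factor in range(n_first):
        start = factor * N_INDICATORS
        lambda_y[start + 1:start + N_INDICATORS, factor] = True

    # THETA: diagonal uniquenesses ("b").
    theta = np.eye(n_vars, dtype=bool)

    # BETA: second-order loadings ("c"); M factors on GM, F factors on GF,
    # and every first-order factor on GMF when it is present.
    beta = np.zeros((n_factors, n_factors), dtype=bool)
    beta[m_rows, gm] = True
    beta[f_rows, gf] = True
    if n_second_order == 3:
        beta[:n_first, n_first + 2] = True

    # PSI, lower triangle only so each symmetric parameter is counted once.
    psi = np.zeros((n_factors, n_factors), dtype=bool)
    psi[np.arange(n_first), np.arange(n_first)] = True  # residual variances ("d")
    if correlated_residuals:
        psi[f_rows, m_rows] = True  # same-instrument M-F residual pairs ("e")
    psi[gf, gm] = True  # GM-GF correlation ("f"), assumed to be the only one
    return {"LAMBDA Y": lambda_y, "THETA": theta, "BETA": beta, "PSI": psi}


def degrees_of_freedom(masks):
    """Distinct variances/covariances minus the number of free parameters."""
    n_vars = masks["THETA"].shape[0]
    n_free = sum(int(mask.sum()) for mask in masks.values())
    return n_vars * (n_vars + 1) // 2 - n_free


model_6 = free_masks(n_second_order=3, correlated_residuals=True)
model_4 = free_masks(n_second_order=2, correlated_residuals=True)
print({name: mask.shape for name, mask in model_6.items()})
print(degrees_of_freedom(model_6), degrees_of_freedom(model_4))
```

Under these assumptions the masks have the dimensions given in the text (30x13, 30x30, 13x13, 13x13 for Model 6), and the parameter counts leave 379 and 389 degrees of freedom, the values reported for Models 6 and 4 in the total-group analysis.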
[Design matrices for Model 6: LAMBDA Y, THETA, BETA, PSI]
Note. a=first-order factor loadings; b=error/uniquenesses for each measured
variable; c=second-order factor loadings; d=first-order factor residuals;
e=correlated residuals among first-order factors used to reflect method
effects; f=correlations among second-order factors.
Figure Captions

Figure 1. Hierarchical models of the structure of responses to all five
masculinity-femininity (MF) instruments. Each of the five models posits
one (GMF=global bipolar MF), two (GM=global masculinity and GF=global
femininity) or three (GM, GF, and GMF) second-order factors. The
second-order factors reflect relations among the first-order M (M1-M5) and
F (F1-F5) factors. Each pair of first-order factors (e.g., M1 and F1)
represents responses to one of the five MF instruments. (The relations
between each first-order factor and its three measured variables are not
shown in detail so as to simplify the diagrams.) Two of the models (4 and
6) also contain correlated residuals; these reflect method/halo effects
that are idiosyncratic to the pair of first-order factors representing the
same MF instrument. The hierarchical structures are presented in terms of
LISREL design matrices in Appendix I.
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